Adaptive context trees and text clustering
Similar resources
Adaptive context trees and text clustering
In the finite-alphabet context, we propose four alternatives to fixed-order Markov models for estimating a conditional distribution. They consist of working with a large class of variable-length Markov models represented by context trees, and building an estimator of the conditional distribution with a risk of the same order as the risk of the best estimator for every model simultaneously, in a cond...
Text Categorization Using Adaptive Context Trees
A new way of representing texts written in natural language is introduced: as a conditional probability distribution at the letter level, learned with a variable-length Markov model called the adaptive context tree model. Text categorization experiments demonstrate the ability of this representation to capture information about the semantic content of the text.
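The idea of representing a text by a letter-level conditional distribution, and categorizing it by likelihood, can be sketched in Python. This is a simplified stand-in, not the adaptive context tree estimator itself: the class `LetterContextModel` and the helpers `train` and `classify` are hypothetical names, and the model simply backs off to the longest context seen in training with add-one smoothing, whereas the adaptive approach combines estimators over many context trees. A text is assigned to the class whose letter model gives it the highest log-likelihood.

```python
import math
from collections import defaultdict


class LetterContextModel:
    """Hypothetical letter-level variable-length Markov model (simplified
    stand-in, not the adaptive context tree estimator): counts which letter
    follows every context of length 0..max_depth and predicts with the
    longest context seen during training."""

    def __init__(self, max_depth, alphabet_size):
        self.max_depth = max_depth
        self.alphabet_size = alphabet_size
        # context string -> {next letter: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, text):
        for i, ch in enumerate(text):
            # Record ch after each preceding context of length 0..max_depth.
            for d in range(min(i, self.max_depth) + 1):
                self.counts[text[i - d:i]][ch] += 1
        return self

    def prob(self, history, ch):
        # Back off from the longest suffix of `history` to the empty context.
        for d in range(min(self.max_depth, len(history)), -1, -1):
            ctx = history[len(history) - d:]
            if ctx in self.counts:
                table = self.counts[ctx]
                # Add-one (Laplace) smoothing keeps unseen letters at nonzero mass.
                return (table.get(ch, 0) + 1) / (sum(table.values()) + self.alphabet_size)
        return 1 / self.alphabet_size  # untrained model: uniform

    def log_likelihood(self, text):
        return sum(
            math.log(self.prob(text[max(0, i - self.max_depth):i], ch))
            for i, ch in enumerate(text)
        )


def train(labeled_texts, max_depth=3):
    """labeled_texts: list of (text, label) pairs. Returns {label: model}."""
    # Shared alphabet size (+1 slot for letters never seen in training),
    # so that per-class likelihoods are comparable.
    alphabet_size = len(set("".join(t for t, _ in labeled_texts))) + 1
    models = {}
    for text, label in labeled_texts:
        model = models.setdefault(label, LetterContextModel(max_depth, alphabet_size))
        model.fit(text)
    return models


def classify(models, text):
    # Choose the class whose letter model assigns the text the highest likelihood.
    return max(models, key=lambda label: models[label].log_likelihood(text))
```

For example, `classify(train([(text_a, "A"), (text_b, "B")]), new_text)` returns whichever label's model explains `new_text` better letter by letter; `max_depth` bounds the context length, playing the role of the maximum tree depth.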
Context-Dependent Conflation, Text Filtering and Clustering
The presence of trivial words in text databases can adversely affect the clustering of records or concepts (words/phrases). Additionally, whether a word/phrase is trivial is context-dependent. The objective of the present paper is to demonstrate a context-dependent trivial word filter to improve clustering quality. Factor analysis was used as a context-dependent trivial word filte...
Annotated Suffix Trees for Text Clustering
In this paper, an extension of tf-idf weighting on the annotated suffix tree (AST) structure is described. The new weighting scheme can be used to compute similarity between texts, which can in turn serve as input to a clustering algorithm. We present preliminary tests of using ASTs to compute the similarity of Russian texts and show a slight improvement in comparison to the baseline cosine similar...
Adaptive channel equalization using context trees
The maximum likelihood sequence estimator is the optimal receiver for the inter-symbol interference (ISI) channel with additive white noise. A receiver is demonstrated that estimates sequence likelihood using a variable order Markov model constructed from a crudely quantized training sequence. Receiver performance is relatively unaffected by heavy-tailed noise that can undermine the performance...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2001
ISSN: 0018-9448
DOI: 10.1109/18.930925